Inversion start site 
I 

AT T ATAAAGGAAAAAGAAAATAAC GCAAT GGACAAGTGGTG 

360 + + + + + 900 

TAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC (41) 
YKGKRK* RNGQVV 



AAGCT GT GAACT CAGGT GT GCACAATT AT CAGGAACAC C C CAAAAC CAAAGT GAGGTAGA 

901 + + + + + + 960 

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT (101) 
KL*TQVCTIIRNTPKPK*GR 



AATAGCAT GAGAAGCCGT GTT T GAT GTT AAT TAA T T 

961 + + + 996 

TT AT C GT ACT CTT C GGCACAAACT ACAATTAAT TAA ( 137 ) 
NSMRSRV*C*LI 



The inversion sequence of the apo-dystrophin-4 cDNA (SEQ ID NO 1) 



Figure 1 



Inversion start site 
I 

TAAAGAAAGAATTATAAAGGAAAAAGAAAATAAC GCAAT GGACAAGT GGTG 

850 + + + + + + 900 

ATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC (51) 
*RKNYKGKRK*RNGQVV 



AAGCT GT GAACT CAGGT GT GCACAATT AT CAGGAACAC C C CAAAAC CAAAGT GAGGTAGA 

901 + + + + + + 960 

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT (111) 
KL*TQVCTIIRNTPKPK*GR 



AATAGCAT GAGAAGCC GT GT TT GAT GTT AATTAA TT 

961 + + + 996 

T TAT C GTACT CTT C GGCACAAACTACAAT TAAT TAA ( 147 ) 
NSMRSRV*C*LI 



The inversion sequence of the apo-dystrophin-4 cDNA plus a 10 base-pair region 5' to the 
start of the inversion sequence (SEQ ID NO 1A). 



Figure 1A 



Start at 710 
I 

AACAAT GGCAG 

+ + 720 

TTGTTACCGTC (11) 
Q W Q 



GT TT T ACAC GT CT AT GCAAT T GT ACAAAAAAGT T AT AAGAAAACT ACAT GT AAAAT CT T G 

721 + + + + + + 780 

CAAAAT GT GCAGAT AC GTTAACAT GT TTT T T CAAT AT T CT TT T GAT GT ACAT TT T AGAAC (71) 
VLHVYAIVQKSYKKTTCKIL 



ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA 

781 + + + + + + 840 

TAT C GATT TAT TGAACGGTAAAGAAATATACCTT GCGTAAAACCCAACAAATTTTTAAAT ( 131 ) 
I A K * L A I SLYGTHFGLFKNL 
inversion start site 
i 

TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAAT GGACAAGTGGT G 

841 + + + + + + 900 

ATTGTCAATATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC (191) 
*QL*RKNYKGKRK*RNGQVV 



AAGCT GT GAACT CAGGT GT GCACAATT AT CAGGAACACC C CAAAAC CAAAGT GAGGTAGA 

901 + + + + + + 960 

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT (251) 
KL*TQVCTIIRNTPKPK*GR 



AAT AGCAT GAGAAGC C GT GT TT GAT GT T AATTAA T T 

961 + + + 996 

T TAT C GT ACT CT T C GGCACAAACTACAAT TAAT TAA (287) 
NSMRSRV*C*LI 



The inversion sequence of the apo-dystrophin-4 cDNA plus the upstream 150 bp 
from the start of the inversion at 860 to the Hpa I enzyme site (SEQ ID NO 1B) 



Figure 1B 
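The sequence panels above pair each top-strand row with its complementary strand and a one-letter translation. The following is a minimal sketch, assuming the standard genetic code and a forward reading frame starting at the first base of the row, of how one such row can be checked. The names `complement`, `translate` and `ROW_841_900` are illustrative and are not part of the original figures.

```python
# Standard genetic code with codon positions ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-by-base complement (same orientation as the figure rows)."""
    return seq.upper().translate(COMPLEMENT)


def translate(seq, frame=0):
    """Translate complete codons starting at `frame`; stop codons are shown as '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


# Top strand of the row numbered 841-900 in Figure 1B, with OCR spaces removed.
ROW_841_900 = "TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG"

print(complement("AACAATGGCAG"))
print(translate(ROW_841_900))
```

The same two helpers can be applied to any other row once the spaces introduced by text extraction are stripped.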



GT G GT T T GAT T GATAGTAAAAAAAAT GTT C GTT AAT ACAAGTAGAGAGTAAGTAAT CAAT 

1 + + + + + + 60 

CAC CAAACTAACTAT CAT T TT T T TT ACAAGCAATT AT GTT CAT CT CT CAT T CATT AGT T A 
VV*LIVKKMFVNTSRE*VIN 

CAATCACTCATAGCCAAGGT GGAAAAGATGTAT CCCATCAT GGAATATTCCTGTT CTGAT 

61 + + + + + + 120 

GT TAGT GAG TAT C GGT T C CAC CT TTT CT ACAT AGGGT AGTAC CTT ATAAGGACAAGACT A 
QSLIAKVEKMYPIMEYSCSD 

AGAAAT CT T GT GCTT AT CT AT GGAATT CT T T T GATAT ATAT TT ACATT GGGAACCT GAAT 

121 + + + + + + 180 

T CT T TAGAACAC GAATAGATACCT TAAGAAAAC TAT ATAT AAAT GTAAC CCTT GGACT T A 
RNLVLIYGILLIYIYIGNLN 

GTAGCTTGACATTTTT CCATGTAAACACCAGTAGCCT GATCCAACATTAAGCT GAT ACTA 

181 + + + + + + 240 

CAT C GAACT GTAAAAAGGTACATTT GT GGTCAT CGGACTAGGTTGTAATTCGACTATGAT 
VA*HFSM*TPVA*SNIKLIL 

ACAAACAACGTGTAATGGCTTCATTAATAAGGCTTTGCTTCTTCCTGGAAACTGGTGAAA 

241 + + + + + + 300 

T GTTT GTT GCAC ATT ACC GAAGTAAT T ATT CCGAAAC GAAGAAG GAC CT TT GACCACT TT 
TNNV*WLH* * GFAS SWKLVK 

AAT CAAAC CTTGTT GT GTACACC CT CGAT GCAGCT T CT GT GT T GT CTT CAC C CAGAAAT G 

301 + + + + + + 360 

TTAGTTTGGAACAACACAT GTGGGAGCTACGTCGAAGACACAACAGAAGTGGGT CTTTAC 
NQTLLCTPSMQLLCCLHPEM 



The polynucleotide sequence of apo-dystrophin-4 (SEQ ID NO 2) 



Figure 2 



GGGAAT GATTT CCCAAAT G GCAAAGAAACAGAGT GAT GCT AT CT AT CT GCAC CT TTT GTA 

361 + + + + + + 420 

C C CTTAC TAAAGGGTT T AC CGTT T CTTT GT CT CACT ACGAT AGAT AGAC GT GGAAAACAT 
GNDFPNGKETE*CYLSAPFV 

AAGT CTGT CTTT CTTT CT CTTT GTTTTCCAGGACACAAT GTAGGAAGT CTTTT CCACAT G 

421 + + + + + + 480 

TT CAGACAGAAAGAAAGAGAAACAAAAGGT CCTGTGTTACATC CTTCAGAAAAGGT GTAC 
KSVFLSLCFPGHNVGSLFHM 

GCAGATGATTTGGGCAGAGCGATGGAGTCCTTAGTATCAGTCAT GACAGAT GAAGAAGGA 

481 + + + + + + 540 

CGTCTACTAAACCCGTCTCGCTACCTCAGGAATCATAGTCAGTACTGTCTACTTCTTCCT 
ADDLGRAMES LVSVMTDEEG 

GCAGAATAAAT GTTTTACAACT CCT GATT C CC GCATGGTTTTT ATAATAT TCATACAACA 

541 + + + + + + 600 

CGTCTTATTTACAAAAT GTTGAGGACTAAGGGCGTACCAAAAATATTATAAGTAT GTT GT 
AE*MFYNS * FPHGFYNIHTT 

AAGAGGATTAGACAGTAAGAGTTT ACAAGAAATAAAT CT ATAT T T TTGT GAAGGGT AGT G 

601 + + + + + + 660 

T TCT C CTAAT CT GT CAT T CT CAAAT GTT CTTTATTTAGAT ATAAAAACACTT C C CAT CAC 
KRI RQ* EFTRNKS I FL* RVV 

GTATTAT ACT GT AGAT TT CAGT AGTTTCTAAGT CT GT TATTGT TTT GTTAACAAT GGCAG 

661 + + + + + + 720 

CATAATATGACATCTAAAGTCATCAAAGATTCAGACAATAACAAAACAATTGTTACCGTC 
VLYCRFQ* FLSLLLFC*QWQ 



Figure 2 (cont'd) 



GT TT TACAC GT CTAT GCAATT GTACAAAAAAGTT ATAAGAAAACTACAT GTAAAAT CTT G 

721 + + + + + + 780 

CAAAAT GTGCAGAT AC GTTAACAT GTT T TT T CAATAT T CT T TT GAT GTACAT T T T AGAAC 
VLHVYAIVQKSYKKTTCKIL 

ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA 

781 + + + + + + 840 

TAT C GATTT ATT GAACGGT AAAGAAATAT ACCTT GC GTAAAACCCAACAAAT TT T TAAAT 
I A K * LAI SLYGTHFGLFKNL 

TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG 

841 + + + + + + 900 

ATTGTCAATATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC 
* QL* RKNYKGKRK* RNGQVV 

AAGCT GT GAACTCAGGT GTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA 

901 + + + + + + 960 

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT 
KL*TQVCTIIRNTPKPK*GR 



AATAGCAT GAGAAGC CGT GT TT GAT GT TAATTAATT 

961 + + + 996 

TT AT CGT ACT CTT CGGCACAAACT ACAAT TAATTAA 
NSMRSRV*C*LI 



Figure 2 (cont'd) 



M 1 2 3 4 5 6 7 8 9 10 11 12 M 




1Kb 






1Kb 



M 13 14 15 16 17 18 19 20 21 23 24 M 



Figure 3A 



a b c d e f g h i j k l m n o p q r s t u v w 




-1Kb 




Figure 3B 




Figure 4A Figure 4B 




Figure 4C 




1 2 3 




100Kd 

69-70Kd 

55Kd 
46Kd 



K562 COS Transfectants 



Figure 5 






. . . T AGT T T C CT AT T CAAT GT ATAGT GCAC CAAAGGT CAAT T CAAGAGTT TAT TAT TAT T 

-239 + + + + + + -180 

. . . AT CAAAGGATAAGTT ACAT AT CAC GT GGT TT CCAGTTAAGTT CT CAAATAATAATAA 
. *FPIQCIVHQRSIQEFIII 

AT TTT CAACC CAAGTAAAAGCAGAGAGAAAATAGCCACCTCCACCATAGCCT CAGAAGCA 

-179 + + + + + + -120 

TAAAAGTTGGGTTCATTTTCGTCTCTCTTTTATCGGTGGAGGTGGTATCGGAGTCTTCGT 
IFNPSKSREKIATSTIASEA 

AGCCAACAGC CT GAAACAGCT TT GAAAT GAAAAGTT GGTGT GGCGGT GAT GGT GGCAGTG 

-119 + + + + + + -60 

T CGGT T GTCGGACTTTGTCGAAACTT TACT TTT CAACCACACCGCCACTACCACCGT CAC 

SQQPETALK* KVGVAVMVAV 

ATAAT GGT GACCGAT GGTT GGGT GCT GGT GAT GGT AGT GGTAGTTGT GAAGGT GGT GATG 

-59 + + + + + + 0 

TAT TAC CACT GGCT AC CAACCCACGACCACT AC CAT CAC CAT CAACACTT C CAC CACT AC 
IMVTDGWVLVMVVVVVKVVM 

GTGGTTTGATTGATAGTAAAAAAAATGTTCGTTAATACAAGTAGAGAGTAAGTAATCAAT 

1 + + + + + + 60 

CAC CAAACTAACTAT CAT T TTT TT T ACAAGCAAT TAT GTT CAT CT CT CATT CAT T AGTT A 
VV*LIVKKMFVNTSRE*VIN 

CAAT CACT CAT AGCCAAGGTGGAAAAGAT GTAT CCCAT CATGGAATATT CCTGTTCT GAT 

61 + + + + + + 120 

GTTAGT GAGTAT CGGTT CCACCTTTT CTACATAGGGTAGTACCTTATAAGGACAAGACTA 
QS LIAKVEKMYP IMEYS CS D 

AGAAAT CT T GT GCT TAT CT AT GGAAT T CT T T T GATAT AT AT TT ACAT T GGGAAC CT GAAT 

121 + + + + + + 180 

T CT T T AGAACAC GAAT AGAT AC CTTAAGAAAACT AT AT ATAAAT GTAACC CT T GGACT T A 
RNLVLIYGI LLIYIYIGNLN 

GTAGCTTGACATTTTTCCAT GTAAACACCAGTAGCCT GAT CCAACATTAAGCT GATACTA 

181 + + + + + + 240 

CAT CGAACTGTAAAAAGGTACATTT GTGGTCAT CGGACTAGGTTGTAATT CGACTATGAT 
VA*HFSM*TPVA*SNIKLIL 

ACAAACAACGTGTAATGGCTTCATTAATAAGGCTTTGCTTCTTCCTGGAAACTGGTGAAA 

241 + + + + + + 300 

TGTTTGTTGCACATTACCGAAGTAATTATTCCGAAACGAAGAAGGACCTTTGACCACTTT 
TNNV*WLH* * GFAS SWKLVK 

AAT CAAAC CT T GT T GT GT ACACCCT CGAT GCAGCTT CT GT GT T GT CTT CAC CCAGAAAT G 

301 + + + + + + 360 

T TAGTT T GGAACAACACAT GT GGGAGCT AC GTC GAAGACACAACAGAAGT GGGT CTT TAC 
NQTLLCTPSMQLLCCLHPEM 

GGGAAT GATTT CCCAAAT GGCAAAGAAACAGAGT GATGCTATCTAT CT GCACCTTTTGTA 

361 + + + + + + 420 

C CCT T ACTAAAGGGT T TAC CGTTT CTTT GT CT CACT AC GATAGAT AGACGT GGAAAACAT 
GNDFPNGKETE* CYLSAPFV 



Figure 6 



begin exon 79 
I 

AAGTCTGTCTTTCTTTCTCTTTGTTTTCC AGGA CACAATGTAGGAAGTCTTTTCCACATG 

421 + + + + + + 480 

T T CAGACAGAAAGAAAGAGAAACAAAAGGT C CT GT GTTACAT C CTT CAGAAAAGGT GT AC 
KSVFLSLCFPGHNVGSLFHM 

GCAGAT GATTTGGGCAGAGCGATGGAGT CCTTAGTATCAGTCATGACAGATGAAGAAGGA 

481 + + + + + + 540 

CGT CT ACTAAACC CGT CT CGCT ACCT CAGGAAT CATAGTCAGTACT GT CT ACT TCT T C CT 
ADDLGRAMES LVSVMTDEEG 

GCAGAATAAATGTTTTACAACT CCTGATT CCCGCAT GGTTTTTATAATATT CATACAACA 

541 + + + + + + 600 

CGT CTTATTTACAAAAT GTT GAGGACTAAGGGCGTACCAAAAATATTATAAGTATGTTGT 
AE*MFYNS* FPHGFYNIHTT 

{ N ) 

AAGAGGAT T AGACAGTAAGAGTT T ACAAGAAATAAAT CT AT AT TTT T GT GAAGGGT AGT G 

601 + + + + + + 660 

T T CT CCT AAT CT GT CATT CT CAAAT GT T CTTT AT T T AGATATAAAAACACT T C C CAT CAC 
KRI R Q * EFTRNKSI F L * R V V 

GTAT TAT ACTGT AGAT TT CAGT AGTTT CTAAGT CT GTT ATT GTT TT GT TAACAATGGCAG 

661 + + + + + + 720 

CAT AAT AT G ACAT CT AAAGT CAT CAAAGAT T C AGACAATAACAAAACAAT T GT T AC C GT C 
VLYCRFQ* FLSLLLFC*QWQ 

GT TT TACACGTCTATGCAATTGTACAAAAAAGTTATAAGAAAACTACAT GTAAAATCTTG 

721 + + + + + + 780 

CAAAAT GT GCAGAT AC GT TAACAT GT TTT TT CAAT ATT CTT T T GAT GTACAT TT T AGAAC 
VLHVYAIVQKSYKKTTCKI L 

ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA 

781 + + + + + + 840 

T AT CGAT T TATT GAACGGTAAAGAAAT AT ACCT T GCGTAAAAC CCAACAAAT TTTTAAAT 
I A K * LAI SLYGTHFGLFKNL 
inversion start site 
I 

TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAAT GGACAAGT GGT G 

841 + + + + + + 900 

ATTGT CAAT ATTT CTTT CTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC 
*QL*RKNYKGKRK*RNGQVV 

AAGCT GT GAACT CAGGT GT GCACAATTAT CAGGAACAC CC CAAAACCAAAGT GAGGTAGA 

901 + + + + + + 960 

T T C GACACT T GAGT C CACACGT GT TAAT AGT CCT TGTGGGGTTTT GGT T T CACT C CAT CT 
KL*TQVCTIIRNTPKPK*GR 

AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT 

961 + + + 996 

T TAT CGT ACT CTT CGGCACAAACT ACAAT TAAT TAA 
NSMRSRV*C*LI 



Figure 6 (cont'd) 



1.62 Kb deletion .657 Kb deletion 



451bp 




apo-4cDNA exon79 of dystrophin, 



860bp : 



137bp 



inversion 



860bp 



2.414Kb 

genomic DNA exon 79 of dystrophin to end of 

dystrophin 
3.265Kb of 
3'UTR 



Figure 7 



exon 78 



Hind 111 rs mxu H,ndl 

| 3.0Kb | 



Sal I 



exon 79 



5.9Kb 



Hind III 



7.8Kb 



Hind 



3' dystrophin genomic DNA 
dystrophin YAC clone DNA 



Sal I 



400bp 




J dystrophin phage clone DNA 

U34) 

apo-dystrophin cDNA 



inverted segment 
map location 
137bp 



1.62 Kb 



*cDNA map is not precisely drawn to scale 



Figure 8 



50 1 

Mgenl073 

Hapol234 ctagtttcct attcaatgta tagtgcacca aaggtcaatt caagagttta 

Consensus 

51 100 

Mgenl073 

Hapol234 ttattattat tttcaaccca agtaaaagca gagagaaaat agccacctcc 

Consensus 

101 begin GRAIL exon @149 150 

Mgenl073 ttcACAGgCT tAAgCAGCca gtAAATGAcA 

Hapol234 accatagcct cagaagcaag ccaACAGcCT gAAaCAGCtt tgAAATGAaA 
Consensus ACAG-CT -AA-CAGC AAATGA-A 

151 200 

Mgenl073 AtT T AtgtGgtAgt cAgGtcactG 

Hapol234 AgTtggtgtg gcggtgatgg tggcagtgaT AatgGtgAcc gAtGgttggG 
Consensus A-T T A G— A A-G G 

201 apo-4 5' end 250 

Mgenl073 TGCTGGTaAT GGTgaTctTA GcaGgcAgAG aaGGTGgTaG TGaTTTGATa 
Hapol234 TGCTGGTgAT GGTagTggTA GttGtgA.AG gtGGTGaT gG_TGgTT TGAT t 
Consensus TGCTGGT-AT GGT--T — TA G — G — A- AG — GGTG-T-G TG-TTTGAT- 

251 Ml 300 

Mgenl073 GtaAaagtgt AgAcTaTaCa acAgaAtAAa TAcAagtatA GTAA 

Hapol234 GatAgtaaaa AaAaTgTtCg ttAatAcAAg TAgAgagtaA GTAAtcaatc 
Consensus G--A A-A-T-T-C A--A-AA- TA-A A GTAA 

301 M2 M3 350 

Mgenl073 ate caaCAAaGTG tgAAAGgTGT gTgCCATtAc acAtctTTCt 

Hapol234 aatcactcat agcCAAgGTG gaAAAGaTGT aTcCCATc At g gAataTTCc 

Consensus CAA-GTG — AAAG-TGT -T-CCAT-A A TTC- 

351 400 

Mgenl073 cG GtgATaagag cCTTgTCTAT GaAgTTC... TGAgATgTgT 

Hapol234 tGttctgata GaaATcttgt gCTTaTCTAT GgAaTTCttt TGAtATaTaT 

Consensus -G G- -AT CTT-TCTAT G-A-TTC TGA-AT-T-T 

401 450 

Mgenl073 TaggAagatG AAtCatcAat TtaCaT TTcTcCCcat cAAAtgaCAc 

Hapol234 TtacAttggG AAcCtgaAtg TagCtTgaca TTtTtCCatg tAAAcacCAg 
Consensus T A G AA-C A— T--C-T TT-T-CC AAA CA- 

451 begin mouse GRAIL exon 500 

Mgenl073 cAtgCTGATC CAgtATTAAG CTaATACTAA C ACca tgcAatGCTT 

Hapol234 tAgcCTGATC CAacAT TAAG CTgATACTAA CaaacaACgt gtaAtgGCTT 
Consensus -A — CTGATC CA — AT TAAG CT-ATACTAA C AC A — GCTT 

501 550 
Mgenl073 CATTAAcAAG GaTTTGCTTC TTgCTaGAAA tgGGT . .AAA AaCggACtgT 
Hapol234 CATTAAtAAG GcTTTGCTTC TTcCTgGAAA ctGGTgaAAA AtCaaACctT 
Consensus CAT TAA- AAG G-TTTGCTTC TT-CT-GAAA — GGT — AAA A-C — AC — T 



551 600 
Mgenl073 GgTcTGTAtA CCtTCaATGC AGCTTaTGTG TTGTCTTttC CtgAAatG 
Hapol234 GtTgTGTAcA CCcTCgATGC AGCTTcTGTG TTGTCTTcaC CcagaAAtgG 
Consensus G-T-TGTA-A CC-TC-ATGC AGCTT-TGTG TTGTCTT — C C AA — G 



Figure 11 



601 650 

Mgenl073 GtAA.TGA.cTc CCaAtAgtGg cAAccAgggG tacaATaCT TGCA 

Hapol234 GgAATGAtTt CCcAaAtgGc aAAgaAacaG agtgATgCTa tctatcTGCA 
Consensus G-AATGA-T- CC-A-A — G- -AA — A G AT-CT TGCA 

651 exon79 700 

Mgenl073 CacTTTGTAA A cTCTT TCTTTCTCTT TGTTTTCCAG GACACAATGT 

Hapol234 CctTTTGTAA AgtctgTCTT TCTTTCTCTT TGTTTTCCAG GACACAATGT 
Consensus C — TTTGTAA A TCTT TCTTTCTCTT TGTTTTCCAG GACACAATGT 

701 750 
Mgenl073 AGGAAGcCTT TTCCACATGG CAGATGATTT GGGCAGAGCG ATGGAGTCCT 
Hapol234 AGGAAGtCTT TTCCACATGG CAGATGATTT GGGCAGAGCG ATGGAGTCCT 
Consensus AGGAAG-CTT TTCCACATGG CAGATGATTT GGGCAGAGCG ATGGAGTCCT 

751 800 

Mgenl073 TAGTtTCAGT CATGACAGAT GAAGAAGGAG CAGAATAAAT GTTTTACAAC 

Hapol234 TAGTaTCAGT CATGACAGAT GAAGAAGGAG CAGAATAAAT GTTTTACAAC 

Consensus TAGT-TCAGT CATGACAGAT GAAGAAGGAG CAGAATAAAT GTTTTACAAC 

801 850 

Mgenl073 TCCTGATTCC CGCATGGTTT TTATAATATT CgTACAACAA AGAGGATTAG 

Hapol234 TCCTGATTCC CGCATGGTTT TTATAATATT CaTACAACAA AGAGGATTAG 

Consensus TCCTGATTCC CGCATGGTTT TTATAATATT C-TACAACAA AGAGGATTAG 

851 900 
Mgenl073 ACAGTAAGAG TTTACAAGAA ATaAAATCTA TATTTTTGTG AAGGGTAGTG 
Hapol234 ACAGTAAGAG TTTACAAGAA AT .AAATCTA TATTTTTGTG AAGGGTAGTG 
Consensus ACAGTAAGAG TTTACAAGAA AT -AAATCTA TATTTTTGTG AAGGGTAGTG 

901 950 
Mgenl073 GTAcTATACT GTAGATTTCA GTAGTTTCTA AGTCTGTTAT TGTTTTGTTA 
Hapol234 GTAtTATACT GTAGATTTCA GTAGTTTCTA AGTCTGTTAT TGTTTTGTTA 
Consensus GTA-TATACT GTAGATTTCA GTAGTTTCTA AGTCTGTTAT TGTTTTGTTA 

951 1000 
Mgenl073 ACAATGGCAG GTTTTACACG TCTATGCAAT TGTACAAAAA AGTTAaAAGA 
Hapol234 ACAATGGCAG GTTTTACACG TCTATGCAAT TGTACAAAAA AGTTAtAAGA 
Consensus ACAATGGCAG GTTTTACACG TCTATGCAAT TGTACAAAAA AGTTA-AAGA 



1001 1050 
Mgenl073 AA...ACATG TAAAATCTTG ATAGCTAAAT AACTTGCCAT TTCTTTATAT 
Hapol234 AAactACATG TAAAATCTTG ATAGCTAAAT AACTTGCCAT TTCTTTATAT 
Consensus AA---ACATG TAAAATCTTG ATAGCTAAAT AACTTGCCAT TTCTTTATAT 

1051 begin inversion@1100 1100 
Mgenl073 GGAACGCATT TTGGGTTGTT TAAAAATTTA TAACAGTTAT AAAGAAAGAt 
Hapol234 GGAACGCATT TTGGGTTGTT TAAAAATTTA TAACAGTTAT AAAGAAAGAa 
Consensus GGAACGCATT TTGGGTTGTT TAAAAATTTA TAACAGTTAT AAAGAAAGA- 



1101 1150 
Mgenl073 TgtaAActaA Agtgtgcttt AtAAAAaAAg ttgtTtataA AaacccctAa 

Hapol234 TtatAAaggA A aa AgAAAAtAAc gcaaTggacA AgtggtgaAg 

Consensus T— AA— A A A-AAAA-AA T A A A- 

1151 1200 
Mgenl073 acaaacACaC AcGcacaCAC AcacAcacac AcacaCaCAc AcaCAcAcTG 
Hapol234 ctgtgaACtC AgGtgtgCAC AattAtcagg AacacCcCAa AacCAaAgTG 
Consensus AC-C A-G CAC A A A C-CA- A — CA-A-TG 



1201 1243 
Mgenl073 AGGcAGcAca ttgtTttGcA ttacTtTagc gTGTatcaTA t. . 
Hapol234 AGGtAGaAat agcaTgaGaA gccgTgTttg aTGTtaatTA att 
Consensus AGG-AG-A T--G-A T-T -TGT TA 



Figure 11 (cont'd) 
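The Consensus rows of the alignment follow a simple convention: a base is printed in upper case where the two aligned sequences agree, and a dash is printed where they differ or where one sequence has a gap. A minimal sketch of that convention is given here; the function name `consensus` is illustrative, and treating '.' as the gap character is an assumption based on how gaps appear in the rows above.

```python
def consensus(seq_a, seq_b):
    """Build a consensus row from two aligned, equal-length sequence blocks.

    Matching bases (case-insensitive) are emitted in upper case; mismatches
    and gap characters such as '.' are emitted as '-'.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned blocks must have the same length")
    out = []
    for a, b in zip(seq_a, seq_b):
        if a.isalpha() and a.upper() == b.upper():
            out.append(a.upper())
        else:
            out.append("-")
    return "".join(out)


# Two ten-base blocks taken from the alignment rows numbered 701-750 and 851-900.
print(consensus("AGGAAGcCTT", "AGGAAGtCTT"))
print(consensus("ATaAAATCTA", "AT.AAATCTA"))
```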
Figure 12A 



-70 bp from 5' end of apo-4 



Inr = GCCC TCAT TCTG GAGAC 
apo-4 = GCGG TGAT GGTG GCAGT - 48% perfect homology with Inr 

71% match on type of base 
(purine vs. pyrimidine) 




Figure 13 
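The two comparisons quoted in Figure 13 (exact base identity, and agreement in base type, purine versus pyrimidine) can be counted position by position. The sketch below uses the two 17-base strings as they are transcribed here with spaces removed; because the transcription may not be letter-perfect, the counts it produces need not reproduce the percentages printed in the figure. The names `identity_count` and `class_match_count` are illustrative.

```python
PURINES = set("AG")

INR = "GCCCTCATTCTGGAGAC"    # Inr row of Figure 13, spaces removed
APO4 = "GCGGTGATGGTGGCAGT"   # apo-4 row of Figure 13, spaces removed


def identity_count(a, b):
    """Number of positions at which both sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def class_match_count(a, b):
    """Number of positions at which both bases are purines or both are pyrimidines."""
    return sum((x in PURINES) == (y in PURINES) for x, y in zip(a, b))


n = len(INR)
print(f"identical bases: {identity_count(INR, APO4)}/{n}")
print(f"same base type:  {class_match_count(INR, APO4)}/{n}")
```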
Figure 17A 
12/23bp spacer 

CACAGT G ACAAAAACC 

heptamer nonamer 



Figure 18A 
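Figure 18A shows a heptamer (CACAGTG) and a nonamer (ACAAAAACC) separated by a 12 or 23 bp spacer. A minimal sketch of how a sequence could be scanned for that exact arrangement follows. It assumes perfect matches to both motifs, which is stricter than a real search would likely be, and the test sequences in the usage lines are synthetic, not taken from the figures.

```python
import re

# Heptamer, then a spacer of exactly 12 or exactly 23 bases, then the nonamer.
RSS_PATTERN = re.compile(r"CACAGTG([ACGT]{12}|[ACGT]{23})ACAAAAACC")


def find_rss(seq):
    """Return (0-based start, spacer length) for each exact heptamer-spacer-nonamer hit."""
    return [(m.start(), len(m.group(1))) for m in RSS_PATTERN.finditer(seq.upper())]


# Synthetic examples only: one site with a 12 bp spacer, one with a 23 bp spacer.
print(find_rss("TT" + "CACAGTG" + "A" * 12 + "ACAAAAACC"))
print(find_rss("CACAGTG" + "G" * 23 + "ACAAAAACC"))
```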



B. 



11640 
* 



11650 



inversion breakpoint 1 

11660 | 11670 



11680 



dystrophin T TTATAACAGT TAT AAAGAAA GA A TT GTAAAC TAAAGTGTGC 
A AATATTGTCA ATATTTCTTT CT A AACATTTG ATTTCACACG 

a 

apo-4 cDNA 840 850 I 870 

[ 138 ] T TTATAACAGT TAT AAAGAAA GA A TTaTAAAg gAAAaaGaaa> 

A AAAAAAAAAA AAAAAAAAAA AA AA-yAAAA.^ yA A A yyA y VV 

dystrophin T TTATAACAGT TATAAAGAAA GA A TTGTAAAC TAAAGTGTGC 



11690 11700 11710 11720 11730 

* * * * * 

dystrophin TTTATAAAAA AAAGTTGTTT ATAAAAACCC CTAAAAAC AA AACAAACACA 
AAATATTTTT TTTCAACAAA TATTTTTGGG GATTTTTGTT TTGTTTGTGT 
apo-4 cDNA 880 890 900 910 920 930 

[ 138 ] aTaAaAtggA cAAGTgGTga ATgtgAACtC aggtgtgCAc AAttAtCAgg> 

v A v A v A vvv A v AAAA v AA w AA vw AAA v A vvvwvv AA v AA vv A v AA vv 
dystrophin TTTATAAAAA AAAGTTGTTT ATAAAAACCC CTAAAAACAA AACAAACACA 



11740 



11750 
* 



dystrophin CACACACACA CA TACACACA 
GTGTGTGTGT GTATGTGTGT 
apo-4 cDNA 940 950 
[ 138 ] aACAC-CcCA -AaAC-CAaA> 



dystrophin CACACACACA CATACACACA 



Figure 18B 



inversion breakpoint2 

13130 13140 13150 13160 | 13170 

* * * * | * 

dystrophin AATTAGCTTT TGGAGAGT GG GTTTTGTCCA T TATTAATAA TTAATTAATT 
TTAATCGAAA ACCTCTCACC CAAAACAGGT AATAATTATT AATTAATTAA 



990 

apo-4 <AATTAATT 



dystrophin AATTAATT 



13180 13190 13200 13210 13220 

***** 

dystrophin AACATCAAAC ACGGCTTCTC ATGCTATTTC TACCTCACTT TGGTTTTGGG 
T TGTAGTTTG TGC CGAAGAG TACGATAAAG ATGGAGTGAA ACCAAAACCC 

980 970 960 950 940 

apo-4 <AACATCAAAC ACGGCTTCTC ATGCTATTTC TACCTCACTT TGGTTTTGGG 



dystrophin AACATCAAAC ACGGCTTCTC ATGCTATTTC TACCTCACTT TGGTTTTGGG 



13230 13240 13250 13260 13270 

***** 

dystrophin GTGTTCCTGA TAATTGTGCA CACCTGAGTT CACAGCTTCA CCACTTGTCC 
CACAAGGACT ATTAACACGT GTGGACTCAA GTGTCGAAGT GGTGAACAGG 

930 920 910 900 890 

apo-4 <GTGTTCCTGA TAATTGTGCA CACCTGAGTT CACAGCTTCA CCACTTGTCC 



dystrophin GTGTTCCTGA TAATTGTGCA CACCTGAGTT CACAGCTTCA CCACTTGTCC 



13280 13290 13300 13310 13320 

***** 

dystrophin ATTGCGTTAT TTTCTTTTTC CTTTATAA TT CTTTCTTT TT CCTTCATAAT 
TAACGCAATA AAAGAAAAAG G AAATATTAA GAAAGAAAAA GGAAGTATTA 

I 

inversion breakpoints 

880 870 860 850 840 

apo-4 <ATTGCGTTAT TTTCTTTTTC CTTTATAAT T CTTTCTTT aT aacTgtTAta 

"""""""""" - ~ ~ ~ - ~ ~ ~ ~ ~ " ~ ~ ~ v" vw~w AA vv 

dystrophin ATTGCGTTAT TTTCTTTTTC CTTTATAATT CTTTCTTTTT CCTTCATAAT 



Figure 18C 
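Figure 18C draws the apo-4 cDNA with a '<' marker and descending coordinates against the genomic dystrophin strand, which is what an inverted segment looks like: the cDNA carries the reverse complement of the genomic sequence. A minimal sketch of that check is given below, using the genomic row numbered 13180-13220 and the cDNA rows numbered 901-996 with OCR spaces removed. The names `reverse_complement`, `GENOMIC_ROW` and `CDNA_901_996` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Top genomic strand of the row starting at 13180 in Figure 18C.
GENOMIC_ROW = "AACATCAAACACGGCTTCTCATGCTATTTCTACCTCACTTTGGTTTTGGG"

# apo-4 cDNA rows 901-960 and 961-996 as given in Figure 2, joined.
CDNA_901_996 = (
    "AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA"
    "AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT"
)

inverted = reverse_complement(GENOMIC_ROW)
position = CDNA_901_996.find(inverted)
print(inverted)
print("found in cDNA" if position >= 0 else "not found in cDNA")
```

If the genomic row is searched in its forward orientation instead, no match is expected in this cDNA interval, which is the signature of the inversion.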



3' 



Figure 18D 



inversion @ 860 
TAACA G TTATA& AGAAAGAA TT^ 

+ + - - + + + + 900 

ATTGTC 7\ATATT TCTTTCT 



Figure 19 



Description of In Vitro Transcription and Translation 
of the Apo-dystrophin cDNA in pBluescript SK+ 

5' (Hind III) exon 79 Hpa I inversion Pst I 



@451 @859 

SSfS* 709 bp Hpa I fragment 287 bp £ Polyl Sr 

(includes four ORFs) Included in full-length 1Kb Pst I fragment 

| | | (includes one ORF) 

0bp 100bp 200bp 

scale 

Linearize plasmid with either Hpa I (truncated) or Pst I (full length). Gene Clean and incubate with T7 
polymerase and dNTPs to produce RNA in vitro. 

Incubate RNA with Wheat Germ Extract or Rabbit Reticulocyte Lysates to produce in vitro translation 
products. 

Separate translation products by SDS-PAGE. Fix, Amplify and Dry Gel. Perform Autoradiography. 



Figure 20 




Figure 20A 




p22- 




1 2 3 4 5 6 




Figure 23 



123456789 10 11 12 13 14 15 16 17 18 




K562 Apo-4 TF Sham 



Figure 24 



12 3 4 




NR R 



Figure 25A 




H2 starting at second methionine - 321 bp, predicted weight = 17.4 Kd + 1 N-glycosylation 
site = 20.4 Kd. 



Figure 26A 



Splice sites for peptide 

MYPIMEYSCSD RNLVLIYGILL... 

YFCEGFYTSMQLYKKVIRKLHKITQWTRTPQNQSEVEIA 107 



Figure 26B 



Start     Exon No.   Exon Position   Exon Length   Intron No.   Intron Position   Intron Length 
@88 bp    78.3       @74-180         106 bp        79.1         @181-529          349 bp 
          79.1       @530-654        125 bp        79.4         @655-720          66 bp 
          79.4       @721-769        49 bp         79.55        @770-875          105 bp 
          79.55      @876-893        18 bp         79.75        @894-932          39 bp 
          79.85      @933-966        33 bp 









Hydrophobicity Scale KD; Candidate membrane-spanning segments: 



Certain 1 12- 32 1.8833 



Figure 26C 
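The "Hydrophobicity Scale KD" line refers to the Kyte-Doolittle scale. The sketch below scans a peptide with a plain 21-residue sliding window and reports the window with the highest mean hydropathy. This is a simplification and an assumption: the prediction program used for the figure weights its window differently, so the score and the exact boundaries printed in the figure (12-32, 1.8833) are not expected to be reproduced. The peptide is the start of the Apo-4S sequence as transcribed in Figure 27A, and the names `window_means`, `best_window` and `H2_NTERM` are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(peptide, width=21):
    """Mean hydropathy of every full window of `width` residues."""
    return [
        sum(KD[aa] for aa in peptide[i:i + width]) / width
        for i in range(len(peptide) - width + 1)
    ]


def best_window(peptide, width=21):
    """Return (start, end, mean) of the most hydrophobic window, 1-based inclusive."""
    means = window_means(peptide, width)
    i = max(range(len(means)), key=means.__getitem__)
    return i + 1, i + width, means[i]


# N-terminal portion of the peptide beginning at the second methionine.
H2_NTERM = "MYPIMEYSCSDRNLVLIYGILLIYIYIGNLNVARHFSMKTPVARSNIKLIL"

start, end, mean = best_window(H2_NTERM)
print(f"most hydrophobic 21-residue window: {start}-{end}, mean KD = {mean:.3f}")
```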



- Predicted TM structure 

> : Too long to be significative 
< : Too short to be significative 
LL : Loop length 
KR : Number of Lys and Arg 
KR Diff : Positive charge difference 
CE : Net charge energy 
CE Diff : Net charge difference 
CH Diff : Charge difference over N-term segments 

KR Diff = 1 CYTOPLASM 
CH Diff = -3 OUTSIDE 
CE Diff = 0.54 OUTSIDE 



Structure no. 1 



Figure 26D 



A readthrough apo-4S product using the second available methionine 



The Apo-4S peptide sequence 

PI Begin TMi (R) 

+30 | P2 

MYPIMEYSCSD RNLVLIYGIL LIYIYIGNLN VA RHFSMKTP VARSNIKLIL 80 

TNNVKWLHKK GFASSWKLVK NQTLLCTPSM QLLCCLHPEM GN DFPNGKET 130 

P3 

ERCYLSAPFV KSVFLSLCFP GHNVGSLFHM ADDLGRAMES LVSVMTDEEG 180 



AEKMFYNSRF PHGFYNIHTT KRIRQKEFTR NKSIFL RRVV VLYCRFQKFL 230 
SLLLFCKQ WQ VLHVYAIVQK SYK KTTCKIL IAKKLAISLY GTHFG LFKNL 280 
KQLKRKNYKG KRKKRNGQVV KLRTQVCTII RNTPKPKRGR NSMRSRVRCK 330 
LI 332 (302aa in predicted polypeptide) 

Figure 27A 



Candidate membrane-spanning segments: 



Certain    1    41-61      1.9073 
Putative   2    101-121    0.8052 
Certain    3    132-152    1.2552 
Putative   4    217-237    1.1833 
Putative   5    254-274    0.9240 



Transmembrane segments included in structure No. 8: 1 2 3 4 5 

Loop lengths: 11 39 10 64 16 58; K+R profile: 1 2 5 (9 >22) 

K+R difference: -23; -> Orientation: N-out 

Charge-difference over N-terminal Membr. segs. (±15 residues): -4 
-> Orientation: N-out 

CYT-EXT profile (neg. values indicate cytoplasmic preference): < -0.13 < 
CYT-EXT difference: 0.13; -> Orientation: N-out 

Figure 27B 



> : Too long to be significative KR Diff : Positive charge difference 

< : Too short to be significative CE : Net charge energy 

LL : Loop length CE Diff : Net charge difference 

KR : Number of Lys and Arg CH Diff : Charge difference over N-term segments 

CE=< CE = -0.13 CE=< 

KR=9 KR=> KR =22 

LL = 39 LL = 64 LL =58 

LL = 11 LL = 10 LL = 16 

KR = 1 KR = 2 KR = 5 

CE=< CE=< CE=< 

KRDiff= -23 OUTSIDE 

CHDiff= -4 OUTSIDE Structure no. 8 

CEDiff= 0.13 OUTSIDE 



Figure 27C 




Figure 28 
Figure 28 (cont'd) 




Figure 29 (cont'd) 




Figure 31 
Additional Oligonucleotide primers used for apo-dystrophin-4 
Southern blotting and sequencing 



FORWARD 

F2.3(@28)       GTT CGT TAA TAC AAG TAG            (SEQ ID NO 15) 
F2.2(@73)       GCC AAG GTG GAA AAG ATG            (SEQ ID NO 16) 
F3.2(@208)      CCA GTA GCC TGA TCC AAC            (SEQ ID NO 17) 
F3.1(@257)      GGC TTC ATT AAT AAG                (SEQ ID NO 18) 
F4.2(@379)      GGC AAA GAA ACA GAG TG             (SEQ ID NO 19) 
F4.1(@449)      CAG GAC ACA ATG TAG GA             (SEQ ID NO 20) 
FJ n(@846)      GTT ATA AAG AAA GAA TTA TAA AG     (SEQ ID NO 21) 
F5.1(@875)      GAA AAT AAC GCA ATG GAC            (SEQ ID NO 22) 

REVERSE 

R6.1(@99)       GAT GGG ATA CAT CTT TTC C          (SEQ ID NO 23) 
F2.2R(@188)     CAA GCT ACA TTC AGG TTC CC         (SEQ ID NO 24) 
R4.1(@510)      GGA CTC CAT CGC TCT GCC            (SEQ ID NO 25) 
R3.4(@694)      GAC TTA GAA ACT ACT G              (SEQ ID NO 26) 
R2.1(@735)      ATA GAC GTG TAA AAC CTG C          (SEQ ID NO 27) 
RSP2(@848)      AAC TGT TAT AAA TTT TTA            (SEQ ID NO 28) 
R2.3 0 (@875)   CTT TTT CCT TTA TAA TTC TTT C      (SEQ ID NO 29) 



Figure 34 



An Additional Splice Product Predicted From The Apo-4 Gene 

A second potential theoretical splice product which retains exon 78.3 is shown below. 

H2 pl-124 spliced product =351 bp, 117 amino acids + 10 from vector + 1 N-glycosylation 
site; predicted weight = 21.9 Kd 

Figure 35A 



Peptide Generated 

MFVNTTKWJKMYPTMEYSC^ 

FIQQRGLDSKSLQEINLYFCEGFYTSMQLYKKVIRKLHKITQWTRTPQNQSEVEIA (117 
amino acids) (SEQ ID NO 30) 

Figure 35B 



Start     Exon No.   Exon Position   Exon Length   Intron No.   Intron Position   Intron Length 
@26 bp    78.1       @16-41          26 bp         78.3         @42-74            33 bp 
          78.3       @75-181         106 bp        79.1         @182-530          349 bp 
          79.1       @531-655        125 bp        79.4         @656-721          66 bp 
          79.4       @722-770        49 bp         79.55        @771-876          105 bp 
          79.55      @877-894        18 bp         79.75        @895-933          39 bp 
          79.85      @934-967        33 bp 









Hydrophobicity Scale KD; Candidate membrane-spanning segments: 



Certain 1 22- 42 1.8833 

Figure 35C 



Predicted TM structure 

> : Too long to be significative 
< : Too short to be significative 
LL : Loop length 
KR : Number of Lys and Arg 
KR Diff : Positive charge difference 
CE : Net charge energy 
CE Diff : Net charge difference 
CH Diff : Charge difference over N-term segments 

KR Diff = 3 CYTOPLASM 
CH Diff = -2 OUTSIDE 
CE Diff = 0.54 OUTSIDE 



Structure no. 1 



Figure 35D 



Nucleic Acid Subsequence Sites Identified In Apo-4 



Motif                      Position               Significance 
CpG                        -7, (+28, +106)        DNA methylation site 
CAAT                       -132, (+127, +131)     Binding of CAAT factors 
TATAAT (5/6)               -120, -114, (+10)      TFIID binding site 
TATA                       -154                   Binds RNA polymerase II and TFIID 
CCATTCA                    -162, -131             Cap Site I 
TATCAGT                    +12, (+25)             Cap Site II 
TGGCTGCAAGCCCAA (10/14)    -57, (+41)             Binds CTF/NF-I protein 
GTGATGG                    -140, -4, +11, +32     Eucaryotic Transcription Initiation Site 



Figure 36 



TopPred predicts 4-5 transmembrane domains for a full-length 
apo-4F product in which all the stop codons are suppressed. 



Protein sequence and position of predicted TM domains 

Begin TMi (R) 
PI I P2 

MFVNTSREKV INQSLIAKVE K MYPIMEYSCSD RNLVLIYGIL LIYIYIGNLN VA RHFSMK 60 

TPVARSNIKL I LTNNVKWLH KKGFAS SWKL VK NQTLLCTP SMQLLCCLHP EMG NDFPNGK 120 

P3 

ETERCYLSAP FVKSVFLSLC FPGHNVGSLF HMA DDLGRAM ESLVSVMTDE E GAEKMFYN S 180 



RFPHGFYNIH TTKRIRQKEF TRNKSIFL RR VVVLYCRFQK FLSLLLFCK Q WQVLHVYAIV 240 
QKSY KKTTCK ILIAKKLAIS LYGTHF GLFK NLKQLKRKNY KGKRKKRNGQ VVKLRTQVCT 300 
IIRNTPKPKR GRNSMRSRVR CKLI (324 amino acids) (SEQ ID NO 31) 



Hydrophobicity Scale KD 

Figure 37A 



Apo-4F : Candidate membrane-spanning segments: 



Certain    1    33-53      1.9073 
Putative   2    93-113     0.8052 
Certain    3    124-144    1.2552 
Putative   4    209-229    1.1833 
Putative   5    246-266    0.9240 



I. Transmembrane segments included in structure 8: 1 2 3 4 5; Loop lengths: 32 39 10 64 
16 58 



Figure 37B 



K+R difference: -19; -> Orientation: N-out; Charge-difference over N-terminal Membr. segs. 
(±15 residues): -3; -> Orientation: N-out 

CYT-EXT profile (neg. values indicate cytoplasmic preference): < < < < -0.13 < 

CYT-EXT difference: 0.13 
-> Orientation: N-out 

II Transmembrane segments included in structure 7: 1 3 4 5; Loop lengths: 32 70 64 16 
58 

K+R profile: 5 > 22 > 5; K+R difference: 22 -> Orientation: N-in 
Charge-difference over N-terminal Membr. segs. (±15 residues): -3; -> Orientation: N-out 
CYT-EXT profile (neg. values indicate cytoplasmic preference): < -0.13 < -0.26 < 
CYT-EXT difference: 0.13; -> Orientation: N-out 

Figure 37B (cont'd) 



TopPred predicts a cytoplasmic N-terminus for four TM domains 



> : Too long to be significative 
< : Too short to be significative 
LL : Loop length 
KR : Number of Lys and Arg 
KR Diff : Positive charge difference 
CE : Net charge energy 
CE Diff : Net charge difference 
CH Diff : Charge difference over N-term segments 



CE = -0.26 CE =< 
KR => KR = 5 

LL = 70 LL = 16 




LL = 32 
KR = 5 
CE=< 



LL =64 
KR=> 



LL = 58 
KR =22 



CE=-0.13 CE=< 



KR Diff = 22 CYTOPLASM 
CH Diff = -3 OUTSIDE 
CE Diff = 0.13 OUTSIDE 



Structure no. 7 



Figure 37C 



Basic Features of a Transposon or Retrovirus 
LTR 




250-600bp 


mobile genes/ virion 


250-600bp 




gag pol 


env 





Target site 10-50bp 

5-10bp Inverted repeat 

direct repeat (IR) 
(DR) 



3' 



Figure 38A 



Structure of the apo-4 inversion element before rearrangement 



1 




9bp DR 12bp IR 12bp IR 6bp IR 9bp DR 



Figure 38B 



Deleted sequences 



Exon 2 



Exon 3 



5' 



viral genes 



RNA transcript is promoted from cell sequences but enhanced 
and terminated by viral sequences. 



Figure 39A 




RNA transcript is promoted from cell sequences but enhanced 
and terminated by inversion sequences which may also 
activate suppressor tRNAs or reverse transcriptase activity to 
prevent the recognition of stop codons. Inverted repeats (IR) 
are present at both ends of the inversion, as they are in 
retroviruses and transposable elements. 



Figure 39B 



